Figure 1. The TMS320C40 will shatter your perceptions about high
performance


TI has disclosed the world’s first digital signal processor designed
specifically for parallel processing. The TMS320C40 parallel digital
signal processing (pDSP) chip offers designers unprecedented system
performance by allowing easy connection of a virtually unlimited
number of TMS320C40s.

The ’C40 features six communication ports for direct connection
between processors. An on-chip DMA coprocessor supports
interprocessor communication concurrent with CPU calculations.
With its high-performance 32-bit CPU and development tools
designed specifically for easy development of pDSP applications,
the ’C40 is an excellent building block for the performance-hungry
applications of the 1990s.

Although single-chip DSPs continue to increase in performance,
many designers are turning to multiple-DSP solutions to obtain high
performance levels. An implementation with a DSP optimized for
parallel processing will yield still higher performance. The ’C40 is
the highest-performance 32-bit floating-point DSP available today,
operating at 275 million operations per second (MOPS) and
transferring data at 320 million bytes per second (Mbytes/sec)
with a 40-ns cycle time.

Parallel processing is being embraced to answer the needs of
computationally intensive applications such as 3D graphics, image
compression, decompression and identification, neural networks,
robotics, and speech recognition.

The ’C40 not only offers the industry’s highest-performance
(Continued on page 2.)

TMS320C5x named finalist
Innovation of the Year

EDN Magazine has selected the TMS320C5x DSP as a finalist in
the EDN Innovation of the Year Awards competition, as announced
in the September 3rd edition of EDN. This issue also includes a
reader survey which will determine the winner of the award.

Factors contributing to the ’C5x’s innovativeness include its high
level of system integration, large on-chip integration (12K of
memory and 1.2 million transistors), and adoption of the IEEE
JTAG test/emulation standard. Innovations like JTAG and advanced
on-chip test/emulation hardware have enabled leading-edge
development tools such as the ’C5x scan-based emulator and
high-level language debugger, a user interface that allows trace and
breakpoints while viewing running
(Continued on page 3.)

Spectron announces OSPA 
for TMS320 DSPs 

Spectron MicroSystems has announced the Open Signal Processing
Architecture (OSPA)™ for the TI TMS320 floating-point DSPs.
OSPA provides a common set of interfaces and protocols that
facilitate integration of DSP hardware and software with end-user
application programs executing on standard host computer
systems. Based on SPOX™, the
(Continued on page 4.)
TMS320C40: Performance tailored for your application

Does your application require more performance than you can get from a
single DSP solution? The ’C40 is a single-chip building block that will
allow users to get the performance they need. With the ’C40’s
communication ports and support for shared global memory, ’C40
processors can be connected in a variety of configurations.

These ports, along with the ’C40’s multichannel DMA coprocessor and
high-performance 32-bit CPU, provide the ingredients to build the
supersystems you have envisioned.

The ’C40 is the first true pDSP building block with communication
ports to support interprocessor communications and global and local
buses to support external memory in parallel systems.

A virtually unlimited variety of pDSP configurations can be achieved
by connecting the pins on the communication ports without any external
logic. You can construct a 2-D mesh system for image processing, or
build a tree structure for speech recognition algorithms. Bidirectional
rings with forward-backward propagation are well suited to neural
networks, and a pipelined linear array handles the convolution,
correlation, and complex arithmetic required by radar applications.

Figure 3. The ’C40’s flexible architecture enables unlimited
configurations and unlimited performance


The sky is the limit with the flexibility and performance of the ’C40.
With TI’s pDSP software and development tools, you can design a
high-performance pDSP system to meet your application’s performance
needs.


EDN (Cont. from page 1.)

assembly and C languages simultaneously. Finally, the ’C5x’s modular
architecture promises rapid, application-specific spin-offs in the
future.

The TMS320C51 has already begun sampling and has been well received
in telecom, computer, and control applications. Development software
is available today, as is the ’C5x Software Development System (SWDS),
a PC-resident board that includes a ’C51 device and allows full-speed
execution of code. The ’C5x SWDS features the high-level language
debugger for fast time to market. Third-party tools are also available.

To receive a copy of the TMS320C5x User’s Guide (SPRU056),
call the TI Customer Response Center at (800) 232-3200, x3510.


New optimization capability in ’C30 C compiler

Contributed by Alan Davis


Release 4.00 of the TMS320C30 ANSI C compiler contains advanced new
optimization capabilities that can dramatically improve the
performance of DSP applications written in C. The new compiler applies
dozens of general optimization techniques while taking advantage of
the ’C30’s highly parallel architecture, such as allocating variables
into registers, simplifying statements and expressions, and
rearranging loops. The new optimizations often produce speed
improvements of 15-25% on general-purpose code and up to 1000% (10x)
on loops. In addition, the new compiler reduces the memory
requirements of most programs.

One of the most important considerations in compiling for a DSP is
register use. The ’C30 compiler maximizes the use of registers for
storage of local variables, parameters, and temporaries. By analyzing
the lifetime of variables and using a weighted cost algorithm, the
compiler can optimally map the set of variables into the ’C30
register set. Often, a single register can be used for several
variables whose uses do not overlap.
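
The idea of sharing one register among variables whose lifetimes do not overlap can be sketched as follows. This is a minimal illustration only: it uses a simple greedy pass over live ranges, not the weighted cost algorithm the compiler applies, and the function name, variable names, and the numeric live ranges are all hypothetical.

```python
def share_registers(live_ranges):
    """Assign register numbers so non-overlapping variables share a register.

    live_ranges maps a variable name to (first_use, last_use), both inclusive.
    This greedy pass is a stand-in for illustration, not the compiler's method.
    """
    assignment = {}
    busy_until = []  # busy_until[r] = last use of the variable held in register r
    ordered = sorted(live_ranges.items(), key=lambda kv: kv[1][0])
    for name, (first_use, last_use) in ordered:
        for reg, last in enumerate(busy_until):
            if last < first_use:          # previous occupant is dead: reuse it
                busy_until[reg] = last_use
                assignment[name] = reg
                break
        else:                             # every register is still live
            busy_until.append(last_use)
            assignment[name] = len(busy_until) - 1
    return assignment


# Hypothetical live ranges, numbered by statement position.
ranges = {"i": (0, 3), "sum": (0, 9), "tmp": (4, 6), "j": (7, 9)}
registers = share_registers(ranges)
print(registers)
```

Here `i`, `tmp`, and `j` are never live at the same time, so they end up in one register, while `sum`, which is live throughout, keeps a register of its own.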


Many DSP applications are very loop-intensive. The optimizer
recognizes loops and rewrites them to execute efficiently on the
’C30. For example, a dot-product program that loops through the
elements of two arrays and sums their products is rewritten to use
the ’C30’s autoincrement addressing and repeat-block capabilities, so
that the body of the loop is only two instructions long.
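
The shape of such a loop can be modeled in a few lines. The sketch below is not compiler output; it is a plain model, under the assumption that the two instructions are a multiply and an accumulate, with the index variables standing in for autoincremented address registers and the fixed-count loop standing in for a repeat block. All names are hypothetical.

```python
def dot_product(a, b):
    """Sum of element-wise products, written to mirror a two-instruction body."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    pa = pb = 0                  # stand-ins for two autoincrementing pointers
    acc = 0.0
    for _ in range(len(a)):      # stand-in for a repeat block with a preset count
        product = a[pa] * b[pb]  # operation 1: multiply the two current elements
        pa += 1                  # pointer updates ride along with the access
        pb += 1
        acc += product           # operation 2: accumulate
    return acc


result = dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)  # 32.0
```

No separate loop-counter test or pointer arithmetic appears in the body, which is the point of combining autoincrement addressing with a repeat block.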

The ’C30 compiler also incorporates numerous other state-of-the-art
optimizations. These include common subexpression elimination, copy
propagation, inline expansion of runtime support functions, strength
reduction, and many more. Some of the ’C30-specific features that the
compiler can use are parallel instructions, delayed branches, and
zero-overhead loops. For the first time, the performance of compiled
code can approach that of hand-coded assembly for even the most
demanding DSP algorithms.

Release 4.00 is available now. Free 
updates will be shipped to registered 
users. 
DESIGN TIPS


TMS320C30 FFT routine 

Contributed by Donald G. Chandler and
Nelson L. Chang, Omniplanar, Inc.

In Appendix C2 of Volume 3, Digital Signal Processing Applications
with the TMS320 Family (pp. 112-113), a C-callable radix-2 real FFT
program for the TMS320C30 is provided. The bit-reversal section relies
heavily on TI’s bit-reversed addressing mode (e.g., *ARn++(IR0)B).
The base address must start at an address where the last m binary
digits are zero (m is the log (base 2) of the FFT size). For example,
if the FFT size is 64, the base address must start at
xxxxxxxxxx000000.
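
The alignment rule can be checked with a mask, as in the minimal sketch below. The function names and the sample addresses are hypothetical and chosen only for illustration; the check itself is just "the low m bits of the base address are zero" for an FFT size of 2^m.

```python
def is_bit_reversal_aligned(base_address, fft_size):
    """True when the low log2(fft_size) bits of base_address are all zero."""
    if fft_size <= 0 or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of two")
    return base_address & (fft_size - 1) == 0


def align_up(base_address, fft_size):
    """Smallest address >= base_address that satisfies the alignment rule."""
    mask = fft_size - 1
    return (base_address + mask) & ~mask


# Arbitrary example addresses for a 64-point FFT (m = 6).
print(is_bit_reversal_aligned(0x809800, 64))   # True
print(is_bit_reversal_aligned(0x809821, 64))   # False
print(hex(align_up(0x809821, 64)))             # 0x809840
```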

However, invoking the FFT subroutine with a data set from a C program
leads to unpredictable results; it depends on where the linker has
placed the data set in memory. One cannot easily correct this
situation from C. We solved this problem with some minor
modifications to the original assembly bit reversal code, as shown in
Figure 5.

In this code, the lines ending with comments are modifications of the
original code. We first set the two registers AR0 and AR1 to zero.
This ensures that the bit-reversed addressing will work. We then
define the index register IR1 to contain the address of the input
data (i.e., an offset from zero). Note that we then manipulate the
pointers at AR0 and AR1 on the fly; every time we need to load or
store we pre-increment with the offset. These modifications do not
increase the code execution time.


We're Glad You Asked

Q: What do you recommend I use as decoupling capacitors for the
TMS320C25?

A: We recommend that you put a large capacitor (22 µF) at the point
the power comes onto the board and a small capacitor (0.01 µF to
0.1 µF) at each power pin on the ’C25. This applies to all TMS320
devices.

Q: What is the difference between the single-access and dual-access
RAM on the TMS320C5x?

A: The single-access RAM looks like zero-wait-state external RAM to
the CPU: one read or write per instruction cycle. Dual-access RAM
allows a read and a write in a single instruction cycle.

Q: Can I configure the TMS320C50’s 9K of single-access RAM to both
data and program (for example, 4K data and 5K program)?

A: Yes. You can enable the OVLY and RAM bits in the status register.
This enables the RAM to be both data and program.


Using the new fixed-point DSP 
macro capability 

The enhanced macro capability introduced as part of Version 6.00 of
the TMS320 Fixed-Point (’C1x/’C2x/’C5x) DSP assembler is based on
string substitution. The macro functionality defines a “substitution
symbol” which represents a character string. When the assembler
encounters one of these symbols it “substitutes” the character string
for the symbol. Substitution symbols can be used anywhere within a
source module and to represent macro parameters.

Built-in functions enable developers to make decisions based on the
string value of substitution symbols. These functions always return a
value, and are most useful in logical expressions.

The example in Figure 6 is a “smart” context switch macro. The input
to the macro is a list of items that should be saved by the context
switch. The built-in function $ISMEMBER(ITEM, PARMS) removes the
first parameter from the list and assigns it to ITEM. $SYMLEN()
returns the length of a substitution symbol, and $SYMCMP() compares
the string values of two character strings.
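
The control flow of the Figure 6 macro can be modeled outside the assembler. The sketch below is only an illustration: the Python functions are hypothetical stand-ins for the built-ins, their return conventions (0 for an empty list, 0 for equal strings) are inferred from the "= 0" tests in Figure 6, and only the instruction mnemonics from the figure are emitted, without operands.

```python
def ismember(parms):
    """Model of $ISMEMBER(ITEM, PARMS): split off the first list entry.

    Returns (flag, item, remaining); flag is 0 when the list is empty.
    """
    if not parms:
        return 0, "", ""
    item, _, remaining = parms.partition(",")
    return 1, item.strip(), remaining.strip()


def symcmp(a, b):
    """Model of $SYMCMP: returns 0 when the two strings are equal."""
    return (a > b) - (a < b)


# Mnemonics emitted for each item, taken from Figure 6 (operands omitted).
SAVE_SEQUENCES = {
    "ACC": ["SACH", "SACL"],
    "ST": ["SST1", "SST"],
    "P": ["SPM", "SPH", "SPL"],
    "T": ["MPYK", "SPL"],
}


def expand_switch(parms):
    """Return the mnemonics SWITCH would emit for a comma-separated list."""
    emitted = []
    if len(parms) == 0:                    # .IF $SYMLEN(PARMS) = 0 / .MEXIT
        return emitted
    emitted += ["LARP", "MAR"]             # set up the stack pointer
    while True:                            # .LOOP
        flag, item, parms = ismember(parms)
        if flag == 0:                      # .BREAK $ISMEMBER(ITEM, PARMS) = 0
            break
        for name, sequence in SAVE_SEQUENCES.items():
            if symcmp(item, name) == 0:    # .IF / .ELSEIF chain
                emitted += sequence
                break
    return emitted                         # .ENDLOOP / .ENDM


print(expand_switch("ACC,P,T"))
```

Each pass through the loop consumes one entry of the list, so the expansion contains save code only for the items the caller named.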


* CONTEXT SAVE ON SUBROUTINE CALL OR INTERRUPT
* ASSUMES AR7 IS THE STACK POINTER
* PARMS - LIST OF ITEMS THAT ARE SAVED
*         ACC, ST, P, T
* EXAMPLE  SWITCH ACC,P,T       ; SAVE ACCUMULATOR, T, AND P REGS.

SWITCH  .MACRO   PARMS
        .VAR     ITEM
        .IF      $SYMLEN(PARMS) = 0
        .MEXIT
        .ENDIF
        LARP     AR7            ; SET UP SP
        MAR
        .LOOP
        .BREAK   $ISMEMBER(ITEM, PARMS) = 0
        .IF      $SYMCMP(ITEM, "ACC") = 0
        SACH     *+
        SACL
        .ELSEIF  $SYMCMP(ITEM, "ST") = 0
        SST1
        SST
        .ELSEIF  $SYMCMP(ITEM, "P") = 0
        SPM      0              ; NO SHIFT ON PR OUTPUT
        SPH      *+
        SPL      *+
        .ELSEIF  $SYMCMP(ITEM, "T") = 0
        MPYK     1
        SPL      *+
        .ENDIF
        .ENDLOOP
        .ENDM

Figure 6. Smart context switch macro



        LDI     @FFTSIZ,RC
        SUBI    1,RC
        LDI     @FFTSIZ,IR0
        LSH     -1,IR0
        SUBI    R0,R0           ;define R0 to be zero
        LDI     R0,AR0          ;set AR0 to be zero
        LDI     R0,AR1          ;set AR1 to be zero
        LDI     @INPUT,IR1      ;define IR1 to have the proper offset
        RPTB    BITRV
        CMPI    AR1,AR0
        BGE     CONT
        LDF     *+AR0(IR1),R0   ;pre-increment the index to AR0
||      LDF     *+AR1(IR1),R1   ;pre-increment the index to AR1
        STF     R0,*+AR1(IR1)   ;pre-increment the index to AR1
||      STF     R1,*+AR0(IR1)   ;pre-increment the index to AR0
CONT    NOP     *AR0++
BITRV   NOP     *AR1++(IR0)B

Figure 5. ’C30 bit reversal routine: fft_rl.asm
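
The effect of the modified routine can be followed in a small software model. This is a sketch under stated assumptions, not a cycle-accurate simulation: memory is a Python list, the variables ar0, ar1, ir0, and ir1 mirror the registers in Figure 5, and reverse_carry_add is a hypothetical helper that models the *AR1++(IR0)B update. The base address of 37 is deliberately not aligned.

```python
def bit_reverse(value, bits):
    """Return value with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(value, step, bits):
    """Add step to value with the carry running from the MSB toward the LSB."""
    mask = (1 << bits) - 1
    total = bit_reverse(value, bits) + bit_reverse(step, bits)
    return bit_reverse(total & mask, bits)


def bit_reverse_in_place(memory, base, fft_size):
    """Reorder memory[base : base + fft_size] into bit-reversed order."""
    bits = fft_size.bit_length() - 1       # log2(fft_size); power of two assumed
    ir0 = fft_size >> 1                    # LDI @FFTSIZ,IR0 / LSH -1,IR0
    ir1 = base                             # LDI @INPUT,IR1
    ar0 = ar1 = 0                          # both pointers start at zero
    for _ in range(fft_size):              # RPTB with RC = FFTSIZ - 1
        if ar0 < ar1:                      # CMPI AR1,AR0 / BGE CONT
            lo, hi = ir1 + ar0, ir1 + ar1  # offset applied at each access
            memory[lo], memory[hi] = memory[hi], memory[lo]
        ar0 += 1                           # NOP *AR0++
        ar1 = reverse_carry_add(ar1, ir0, bits)   # NOP *AR1++(IR0)B
    return memory


memory = list(range(100, 137)) + list(range(8))   # data starts at address 37
bit_reverse_in_place(memory, base=37, fft_size=8)
print(memory[37:45])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Because the two counters start at zero and the data address is added only when memory is touched, the result does not depend on where the data set is placed.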